AITopics | language technology

Generative AI Practices, Literacy, and Divides: An Empirical Analysis in the Italian Context

Savoldi, Beatrice, Attanasio, Giuseppe, Gorodetskaya, Olga, Manerba, Marta Marchiori, Bassignana, Elisa, Casola, Silvia, Negri, Matteo, Caselli, Tommaso, Bentivogli, Luisa, Ramponi, Alan, Muti, Arianna, Balbo, Nicoletta, Nozza, Debora

arXiv.org Artificial Intelligence, Dec-4-2025

The rise of Artificial Intelligence (AI) language technologies, particularly generative AI (GenAI) chatbots accessible via conversational interfaces, is transforming digital interactions. While these tools hold societal promise, they also risk widening digital divides due to uneven adoption and low awareness of their limitations. This study presents the first comprehensive empirical mapping of GenAI adoption, usage patterns, and literacy in Italy, based on newly collected survey data from 1,906 Italian-speaking adults. Our findings reveal widespread adoption for both work and personal use, including sensitive tasks like emotional support and medical advice. Crucially, GenAI is supplanting other technologies to become a primary information source: this trend persists despite low user digital literacy, posing a risk as users struggle to recognize errors or misinformation. Moreover, we identify a significant gender divide -- particularly pronounced in older generations -- where women are half as likely to adopt GenAI and use it less frequently than men. While we find literacy to be a key predictor of adoption, it only partially explains this disparity, suggesting that other barriers are at play. Overall, our data provide granular insights into the multipurpose usage of GenAI, highlighting the dual need for targeted educational initiatives and further investigation into the underlying barriers to equitable participation that competence alone cannot explain.

artificial intelligence, machine learning, natural language, (17 more...)

2512.03671

Country:

North America > United States (0.67)
Europe > Italy (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.85)

Awal -- Community-Powered Language Technology for Tamazight

Öktem, Alp, Boudichat, Farida

arXiv.org Artificial Intelligence, Nov-3-2025

This paper presents Awal, a community-powered initiative for developing language technology resources for Tamazight. We provide a comprehensive review of the NLP landscape for Tamazight, examining recent progress in computational resources and the emergence of community-driven approaches to address persistent data scarcity. Launched in 2024, the awaldigital.org platform addresses the underrepresentation of Tamazight in digital spaces by enabling speakers to contribute translation and voice data. We analyze 18 months of community engagement, revealing significant barriers to participation, including limited confidence in written Tamazight and ongoing standardization challenges. Despite widespread positive reception, actual data contribution remained concentrated among linguists and activists. The modest scale of community contributions -- 6,421 translation pairs and 3 hours of speech data -- highlights the limitations of applying standard crowdsourcing approaches to languages with complex sociolinguistic contexts. We are working on improved open-source MT models using the collected data.

artificial intelligence, natural language, tamazight, (18 more...)

2510.27407

Country:

Africa > Middle East (0.70)
Europe > Spain > Catalonia (0.14)

Genre:

Overview (0.68)
Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)

A Computational Framework for Interpretable Text-Based Personality Assessment from Social Media

Gjurković, Matej

arXiv.org Artificial Intelligence, Oct-6-2025

Personality refers to individual differences in behavior, thinking, and feeling. With the growing availability of digital footprints, especially from social media, automated methods for personality assessment have become increasingly important. Natural language processing (NLP) enables the analysis of unstructured text data to identify personality indicators. However, two main challenges remain central to this thesis: the scarcity of large, personality-labeled datasets and the disconnect between personality psychology and NLP, which restricts model validity and interpretability. To address these challenges, this thesis presents two datasets -- MBTI9k and PANDORA -- collected from Reddit, a platform known for user anonymity and diverse discussions. The PANDORA dataset contains 17 million comments from over 10,000 users and integrates the MBTI and Big Five personality models with demographic information, overcoming limitations in data size, quality, and label coverage. Experiments on these datasets show that demographic variables influence model validity. In response, the thesis develops SIMPA (Statement-to-Item Matching Personality Assessment), a computational framework for interpretable personality assessment that matches user-generated statements with validated questionnaire items. By using machine learning and semantic similarity, SIMPA delivers personality assessments comparable to human evaluations while maintaining high interpretability and efficiency. Although focused on personality assessment, SIMPA's versatility extends beyond this domain. Its model-agnostic design, layered cue detection, and scalability make it suitable for various research and practical applications involving complex label taxonomies and variable cue associations with target concepts.
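The statement-to-item matching idea in this abstract can be illustrated with a small sketch. This is not the SIMPA implementation: the abstract mentions machine learning and semantic similarity, whereas the code below swaps in plain bag-of-words cosine similarity as a stand-in, and the questionnaire items, the `ITEMS` dictionary, and the `match_statement` helper are all hypothetical names invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical questionnaire items (illustrative only, not taken from the thesis).
ITEMS = {
    "extraversion": "I enjoy talking to a lot of people at parties",
    "conscientiousness": "I like to keep my things tidy and in order",
}


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def match_statement(statement, items=ITEMS):
    """Return (label, score) for the item most similar to the statement."""
    vec = bag_of_words(statement)
    scores = {label: cosine(vec, bag_of_words(text)) for label, text in items.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


label, score = match_statement("I love talking to people at parties")
print(label, round(score, 3))
```

Because each match points back to a concrete questionnaire item, the result can be inspected directly, which is the kind of interpretability the abstract emphasizes. A realistic system would presumably replace the token-overlap similarity with a learned semantic representation.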

large language model, machine learning, myers-briggs type indicator, (24 more...)

2510.02811

Country:

Asia (1.00)
North America > United States > California (0.45)
Europe > United Kingdom > England (0.27)
North America > United States > Minnesota (0.27)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Overview (1.00)
(3 more...)

Industry:

Health & Medicine (1.00)
Education (1.00)
Information Technology > Security & Privacy (0.92)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(5 more...)

What Do Indonesians Really Need from Language Technology? A Nationwide Survey

Kautsar, Muhammad Dehan Al, Susanto, Lucky, Wijaya, Derry, Koto, Fajri

arXiv.org Artificial Intelligence, Sep-30-2025

There is an emerging effort to develop NLP for Indonesia's 700+ local languages, but progress remains costly due to the need for direct engagement with native speakers. However, it is unclear what these language communities truly need from language technology. To address this, we conduct a nationwide survey to assess the actual needs of native speakers in Indonesia. Our findings indicate that addressing language barriers, particularly through machine translation and information retrieval, is the most critical priority. Although there is strong enthusiasm for advancements in language technology, concerns around privacy, bias, and the use of public data for AI training highlight the need for greater transparency and clear communication to support broader AI adoption.

artificial intelligence, chatbot, natural language, (17 more...)

2506.07506

Country:

Europe (1.00)
Asia > Indonesia > Sulawesi (1.00)
Asia > Indonesia > Borneo > Kalimantan (0.68)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

b96ce67b2f2d45e4ab315e13a6b5b9c5-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Aug-18-2025, 05:50:15 GMT

computational linguistic, machine learning, natural language, (18 more...)

Country:

Europe > Greece (0.29)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.05)
(16 more...)

Genre: Research Report (0.69)

Industry:

Banking & Finance (1.00)
Law (0.68)
Government > Regional Government > Europe Government (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language

C, Jesus Alvarez, Karajeanes, Daua D., Prado, Ashley Celeste, Ruttan, John, Yang, Ivory, O'Brien, Sean, Sharma, Vasu, Zhu, Kevin

arXiv.org Artificial Intelligence, May-27-2025

The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an empirical evaluation of GPT-4o and GPT-4o-mini for language identification. Our experiments reveal that while LLMs struggle with Comanche in zero-shot settings, few-shot prompting significantly improves performance, achieving near-perfect accuracy with just five examples. Our findings highlight the potential of targeted NLP methodologies in low-resource contexts and emphasize that visibility is the first step toward inclusion. By establishing a foundation for Comanche in NLP, we advocate for computational approaches that prioritize accessibility, cultural sensitivity, and community engagement.

large language model, machine learning, natural language, (18 more...)

doi: 10.18653/v1/2025.americasnlp-1.4

2505.18159

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Oklahoma (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Government > Regional Government > North America Government > United States Government (0.94)
Education (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

The AI Gap: How Socioeconomic Status Affects Language Technology Interactions

Bassignana, Elisa, Curry, Amanda Cercas, Hovy, Dirk

arXiv.org Artificial Intelligence, May-26-2025

Socioeconomic status (SES) fundamentally influences how people interact with each other and, more recently, with digital technologies like Large Language Models (LLMs). While previous research has highlighted the interaction between SES and language technology, it was limited by reliance on proxy metrics and synthetic data. We survey 1,000 individuals from diverse socioeconomic backgrounds about their use of language technologies and generative AI, and collect 6,482 prompts from their previous interactions with LLMs. We find systematic differences across SES groups in language technology usage (i.e., frequency, performed tasks), interaction styles, and topics. Higher SES entails a higher level of abstraction, more concise requests, and topics like 'inclusivity' and 'travel'. Lower SES correlates with higher anthropomorphization of LLMs (using "hello" and "thank you") and more concrete language. Our findings suggest that while generative language technologies are becoming more accessible to everyone, socioeconomic linguistic differences still stratify their use and exacerbate the digital divide. These differences underscore the importance of considering SES in developing language technologies to accommodate varying linguistic needs rooted in socioeconomic factors and limit the AI Gap across SES groups.

large language model, machine learning, natural language, (20 more...)

2505.12158

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.68)
Health & Medicine (0.46)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models

Prottasha, Nusrat Jahan, Chowdhury, Upama Roy, Mohanto, Shetu, Nuzhat, Tasfia, Sami, Abdullah As, Ali, Md Shamol, Sobuj, Md Shohanur Islam, Raman, Hafijur, Kowsher, Md, Garibay, Ozlem Ozmen

arXiv.org Artificial Intelligence, Apr-22-2025

Large models such as Large Language Models (LLMs) and Vision Language Models (VLMs) have transformed artificial intelligence, powering applications in natural language processing, computer vision, and multimodal learning. However, fully fine-tuning these models remains expensive, requiring extensive computational resources, memory, and task-specific data. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a promising solution that allows adapting large models to downstream tasks by updating only a small portion of parameters. This survey presents a comprehensive overview of PEFT techniques, focusing on their motivations, design principles, and effectiveness. We begin by analyzing the resource and accessibility challenges posed by traditional fine-tuning and highlight key issues, such as overfitting, catastrophic forgetting, and parameter inefficiency. We then introduce a structured taxonomy of PEFT methods -- grouped into additive, selective, reparameterized, hybrid, and unified frameworks -- and systematically compare their mechanisms and trade-offs. Beyond taxonomy, we explore the impact of PEFT across diverse domains, including language, vision, and generative modeling, showing how these techniques offer strong performance with lower resource costs. We also discuss important open challenges in scalability, interpretability, and robustness, and suggest future directions such as federated learning, domain adaptation, and theoretical grounding. Our goal is to provide a unified understanding of PEFT and its growing role in enabling practical, efficient, and sustainable use of large models.
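To make "updating only a small portion of parameters" concrete, here is a minimal sketch of one reparameterized approach, a LoRA-style low-rank update, written in plain Python. LoRA is chosen here only as a familiar example of that category, not because the abstract singles it out. The matrix sizes, the rank, and the helper names `lora_effective_weight` and `trainable_fraction` are illustrative assumptions.

```python
# Minimal sketch of a LoRA-style reparameterized update, in pure Python.
# Assumption: all sizes and values below are illustrative, not from the survey.


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, scale=1.0):
    """Return W + scale * (B @ A). W stays frozen; only B and A would be trained."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_fraction(d_out, d_in, rank):
    """Share of a layer's parameters that the low-rank factors B and A account for."""
    return rank * (d_out + d_in) / (d_out * d_in)


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 weight matrix
b = [[1.0], [2.0]]                      # trainable 2x1 factor (rank 1)
a = [[0.5, 0.0, -0.5]]                  # trainable 1x3 factor (rank 1)

w_eff = lora_effective_weight(w, b, a)
print(w_eff)
print(trainable_fraction(4096, 4096, 8))
```

For a hypothetical 4096 x 4096 layer with rank 8, the two factors hold 8 * (4096 + 4096) = 65,536 values against 16,777,216 in the full matrix, which is under 0.4% of the layer's parameters.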

large language model, machine learning, pattern recognition, (25 more...)

2504.14117

Country:

Europe (1.00)
Asia (1.00)
North America > Canada (0.67)
North America > United States > Minnesota (0.28)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.65)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (1.00)
Energy (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(7 more...)

cantnlp@DravidianLangTech2025: A Bag-of-Sounds Approach to Multimodal Hate Speech Detection

Wong, Sidney, Li, Andrew

arXiv.org Artificial Intelligence, Mar-16-2025

This paper presents the systems and results for the Multimodal Social Media Data Analysis in Dravidian Languages (MSMDA-DL) shared task at the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages (DravidianLangTech-2025). We took a 'bag-of-sounds' approach by training our hate speech detection system on the speech (audio) data using transformed Mel spectrogram measures. While our candidate model performed poorly on the test set, our approach offered promising results during training and development for Malayalam and Tamil. With sufficient and well-balanced training data, our results show that it is feasible to use both text and speech (audio) data in the development of multimodal hate speech detection systems.

computational linguistic, machine learning, natural language, (17 more...)

2503.07862

Country:

Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.05)
Oceania > New Zealand (0.04)
North America > United States > New York > New York County > New York City (0.04)
(11 more...)

Genre: Research Report > New Finding (0.86)

Industry: Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Unmask It! AI-Generated Product Review Detection in Dravidian Languages

De, Somsubhra, Vats, Advait

arXiv.org Artificial Intelligence, Mar-12-2025

The rise of Generative AI has led to a surge in AI-generated reviews, often posing a serious threat to the credibility of online platforms. Reviews serve as the primary source of information about products and services. Authentic reviews play a vital role in consumer decision-making. The presence of fabricated content misleads consumers, undermines trust, and facilitates potential fraud in digital marketplaces. This study focuses on detecting AI-generated product reviews in Tamil and Malayalam, two low-resource languages where research in this domain is relatively under-explored. We worked on a range of approaches -- from traditional machine learning methods to advanced transformer-based models such as Indic-BERT, IndicSBERT, MuRIL, XLM-RoBERTa and MalayalamBERT. Our findings highlight the effectiveness of leveraging state-of-the-art transformers in accurately identifying AI-generated content, demonstrating their potential for enhancing the detection of fake reviews in low-resource language settings.

computational linguistic, dravidian language, malayalam, (13 more...)

2503.09289

Country:

Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Europe > Bulgaria > Varna Province > Varna (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)